{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "3.6 层级索引  \n",
    "当目前为止，我们接触的都是一维数据和二维数据，用Pandas的Series和DataFrame对象就可以存储。但我们也经常会遇到存储多维数据的需求，数据索引超过一两个键。因此，Pandas提供了Panel和Panel4D对象解决三维数据与四维数据（详情请参见3.7节 ）。而在实践中，更直观的形式是通过层级索引（hierarchical indexing，也被称为多级索引，multi-indexing）配合多个有不同等级（level）的一级索引一起使用，这样就可以将高维数组转换成类似一维Series和二维DataFrame对象的形式。  "
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "3.6.1 多级索引Series  \n",
    "让我们看看如何用一维的Series对象表示二维数据——用一系列包含特征与数值的数据点来简单演示。  \n",
    "1. 笨办法    \n",
    "2. 好办法：Pandas多级索引  \n",
    "3. 高维数据的多级索引  "
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "index = [('California', 2000), ('California', 2010),                 \n",
    "('New York', 2000), ('New York', 2010),                 \n",
    "('Texas', 2000), ('Texas', 2010)]        \n",
    "populations = [33871648, 37253956,                       \n",
    "18976457, 19378102,                       \n",
    "20851820, 25145561]        \n",
    "pop = pd.Series(populations, index=index)        \n",
    "pop "
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "(California, 2000)    33871648\n",
       "(California, 2010)    37253956\n",
       "(New York, 2000)      18976457\n",
       "(New York, 2010)      19378102\n",
       "(Texas, 2000)         20851820\n",
       "(Texas, 2010)         25145561\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 2
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "source": [
    "pop[('California', 2010):('Texas', 2000)]"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "(California, 2010)    37253956\n",
       "(New York, 2000)      18976457\n",
       "(New York, 2010)      19378102\n",
       "(Texas, 2000)         20851820\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 3
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "source": [
    "pop[[i for i in pop.index if i[1] == 2010]]"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "(California, 2010)    37253956\n",
       "(New York, 2010)      19378102\n",
       "(Texas, 2010)         25145561\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 4
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "source": [
    "index = pd.MultiIndex.from_tuples(index)        \n",
    "index "
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "MultiIndex([('California', 2000),\n",
       "            ('California', 2010),\n",
       "            (  'New York', 2000),\n",
       "            (  'New York', 2010),\n",
       "            (     'Texas', 2000),\n",
       "            (     'Texas', 2010)],\n",
       "           )"
      ]
     },
     "metadata": {},
     "execution_count": 5
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "source": [
    "pop = pop.reindex(index)        \n",
    "pop"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "California  2000    33871648\n",
       "            2010    37253956\n",
       "New York    2000    18976457\n",
       "            2010    19378102\n",
       "Texas       2000    20851820\n",
       "            2010    25145561\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 6
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "其中前两列表示Series的多级索引值，第三列是数据。你会发现有些行仿佛缺失了第一列数据——这其实是多级索引的表现形式，每个空格与上面的索引相同。"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "source": [
    "pop[:, 2010]"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "California    37253956\n",
       "New York      19378102\n",
       "Texas         25145561\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 7
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "source": [
    "pop_df = pop.unstack()        \n",
    "pop_df"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>2000</th>\n",
       "      <th>2010</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>33871648</td>\n",
       "      <td>37253956</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New York</th>\n",
       "      <td>18976457</td>\n",
       "      <td>19378102</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>20851820</td>\n",
       "      <td>25145561</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                2000      2010\n",
       "California  33871648  37253956\n",
       "New York    18976457  19378102\n",
       "Texas       20851820  25145561"
      ]
     },
     "metadata": {},
     "execution_count": 8
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "source": [
    "pop_df.stack()"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "California  2000    33871648\n",
       "            2010    37253956\n",
       "New York    2000    18976457\n",
       "            2010    19378102\n",
       "Texas       2000    20851820\n",
       "            2010    25145561\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 9
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "source": [
    "pop_df = pd.DataFrame({'total': pop,                               \n",
    "'under18': [9267089, 9284094,                                            \n",
    "4687374, 4318033,                                            \n",
    "5906301, 6879014]}) \n",
    "pop_df"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>total</th>\n",
       "      <th>under18</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">California</th>\n",
       "      <th>2000</th>\n",
       "      <td>33871648</td>\n",
       "      <td>9267089</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010</th>\n",
       "      <td>37253956</td>\n",
       "      <td>9284094</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">New York</th>\n",
       "      <th>2000</th>\n",
       "      <td>18976457</td>\n",
       "      <td>4687374</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010</th>\n",
       "      <td>19378102</td>\n",
       "      <td>4318033</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Texas</th>\n",
       "      <th>2000</th>\n",
       "      <td>20851820</td>\n",
       "      <td>5906301</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010</th>\n",
       "      <td>25145561</td>\n",
       "      <td>6879014</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    total  under18\n",
       "California 2000  33871648  9267089\n",
       "           2010  37253956  9284094\n",
       "New York   2000  18976457  4687374\n",
       "           2010  19378102  4318033\n",
       "Texas      2000  20851820  5906301\n",
       "           2010  25145561  6879014"
      ]
     },
     "metadata": {},
     "execution_count": 10
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "source": [
    "f_u18 = pop_df['under18'] / pop_df['total']         \n",
    "f_u18.unstack() "
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>2000</th>\n",
       "      <th>2010</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>0.273594</td>\n",
       "      <td>0.249211</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New York</th>\n",
       "      <td>0.247010</td>\n",
       "      <td>0.222831</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>0.283251</td>\n",
       "      <td>0.273568</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                2000      2010\n",
       "California  0.273594  0.249211\n",
       "New York    0.247010  0.222831\n",
       "Texas       0.283251  0.273568"
      ]
     },
     "metadata": {},
     "execution_count": 11
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "3.6.2 多级索引的创建方法\n",
    "为Series或DataFrame创建多级索引最直接的办法就是将index参数设置为至少二维的索引数组，如下所示  "
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "source": [
    "df = pd.DataFrame(np.random.rand(4, 2),                           \n",
    "index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],                           \n",
    "columns=['data1', 'data2'])\n",
    "df"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">a</th>\n",
       "      <th>1</th>\n",
       "      <td>0.779884</td>\n",
       "      <td>0.478319</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.495630</td>\n",
       "      <td>0.734422</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">b</th>\n",
       "      <th>1</th>\n",
       "      <td>0.255050</td>\n",
       "      <td>0.941415</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.188611</td>\n",
       "      <td>0.644960</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        data1     data2\n",
       "a 1  0.779884  0.478319\n",
       "  2  0.495630  0.734422\n",
       "b 1  0.255050  0.941415\n",
       "  2  0.188611  0.644960"
      ]
     },
     "metadata": {},
     "execution_count": 12
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "source": [
    "data = {('California', 2000): 33871648,                 \n",
    "('California', 2010): 37253956,                 \n",
    "('Texas', 2000): 20851820,                 \n",
    "('Texas', 2010): 25145561,                 \n",
    "('New York', 2000): 18976457,                 \n",
    "('New York', 2010): 19378102}         \n",
    "pd.Series(data)"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "California  2000    33871648\n",
       "            2010    37253956\n",
       "Texas       2000    20851820\n",
       "            2010    25145561\n",
       "New York    2000    18976457\n",
       "            2010    19378102\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 13
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "1. 显式地创建多级索引  \n",
    "2. 多级索引的等级名称  \n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "source": [
    "pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'], [1, 2, 1, 2]])  \n"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "MultiIndex([('a', 1),\n",
       "            ('a', 2),\n",
       "            ('b', 1),\n",
       "            ('b', 2)],\n",
       "           )"
      ]
     },
     "metadata": {},
     "execution_count": 14
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "source": [
    "pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1), ('b', 2)])"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "MultiIndex([('a', 1),\n",
       "            ('a', 2),\n",
       "            ('b', 1),\n",
       "            ('b', 2)],\n",
       "           )"
      ]
     },
     "metadata": {},
     "execution_count": 15
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "source": [
    "pd.MultiIndex.from_product([['a', 'b'], [1, 2]])"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "MultiIndex([('a', 1),\n",
       "            ('a', 2),\n",
       "            ('b', 1),\n",
       "            ('b', 2)],\n",
       "           )"
      ]
     },
     "metadata": {},
     "execution_count": 16
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "source": [
    "pd.MultiIndex(levels=[['a', 'b'], [1, 2]],                        \n",
    "labels=[[0, 0, 1, 1], [0, 1, 0, 1]]) "
   ],
   "outputs": [
    {
     "output_type": "error",
     "ename": "TypeError",
     "evalue": "__new__() got an unexpected keyword argument 'labels'",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m/tmp/ipykernel_6944/2119095251.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m pd.MultiIndex(levels=[['a', 'b'], [1, 2]],                        \n\u001b[0m\u001b[1;32m      2\u001b[0m labels=[[0, 0, 1, 1], [0, 1, 0, 1]]) \n",
      "\u001b[0;31mTypeError\u001b[0m: __new__() got an unexpected keyword argument 'labels'"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "source": [
    "pop.index.names = ['state', 'year']         \n",
    "pop"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "state       year\n",
       "California  2000    33871648\n",
       "            2010    37253956\n",
       "New York    2000    18976457\n",
       "            2010    19378102\n",
       "Texas       2000    20851820\n",
       "            2010    25145561\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 20
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "3. 多级列索引"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "source": [
    "index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],                                    \n",
    "names=['year', 'visit']) \n",
    "columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],                                     \n",
    "names=['subject', 'type']) \n",
    "\n",
    "data = np.round(np.random.randn(4, 6), 1) \n",
    "data[:, ::2] *= 10 \n",
    "data += 37\n",
    "\n",
    "health_data = pd.DataFrame(data, index=index, columns=columns) \n",
    "health_data"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>subject</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Bob</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Guido</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Sue</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th>visit</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2013</th>\n",
       "      <th>1</th>\n",
       "      <td>19.0</td>\n",
       "      <td>37.9</td>\n",
       "      <td>49.0</td>\n",
       "      <td>38.6</td>\n",
       "      <td>50.0</td>\n",
       "      <td>36.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>40.0</td>\n",
       "      <td>37.7</td>\n",
       "      <td>47.0</td>\n",
       "      <td>36.5</td>\n",
       "      <td>40.0</td>\n",
       "      <td>36.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2014</th>\n",
       "      <th>1</th>\n",
       "      <td>44.0</td>\n",
       "      <td>37.4</td>\n",
       "      <td>26.0</td>\n",
       "      <td>37.1</td>\n",
       "      <td>48.0</td>\n",
       "      <td>37.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>47.0</td>\n",
       "      <td>37.2</td>\n",
       "      <td>40.0</td>\n",
       "      <td>37.7</td>\n",
       "      <td>41.0</td>\n",
       "      <td>36.8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "subject      Bob       Guido         Sue      \n",
       "type          HR  Temp    HR  Temp    HR  Temp\n",
       "year visit                                    \n",
       "2013 1      19.0  37.9  49.0  38.6  50.0  36.6\n",
       "     2      40.0  37.7  47.0  36.5  40.0  36.2\n",
       "2014 1      44.0  37.4  26.0  37.1  48.0  37.4\n",
       "     2      47.0  37.2  40.0  37.7  41.0  36.8"
      ]
     },
     "metadata": {},
     "execution_count": 21
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "source": [
    "health_data['Guido']"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th>visit</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2013</th>\n",
       "      <th>1</th>\n",
       "      <td>49.0</td>\n",
       "      <td>38.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>47.0</td>\n",
       "      <td>36.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2014</th>\n",
       "      <th>1</th>\n",
       "      <td>26.0</td>\n",
       "      <td>37.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>40.0</td>\n",
       "      <td>37.7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "type          HR  Temp\n",
       "year visit            \n",
       "2013 1      49.0  38.6\n",
       "     2      47.0  36.5\n",
       "2014 1      26.0  37.1\n",
       "     2      40.0  37.7"
      ]
     },
     "metadata": {},
     "execution_count": 22
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "3.6.3 多级索引的取值与切片  \n",
    "1. Series多级索引  \n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "source": [
    "pop"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "state       year\n",
       "California  2000    33871648\n",
       "            2010    37253956\n",
       "New York    2000    18976457\n",
       "            2010    19378102\n",
       "Texas       2000    20851820\n",
       "            2010    25145561\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 23
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "source": [
    "pop['California', 2000] "
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "33871648"
      ]
     },
     "metadata": {},
     "execution_count": 24
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "MultiIndex也支持局部取值（partial indexing），即只取索引的某一个层级。假如只取最高级的索引，获得的结果是一个新的Series，未被选中的低层索引值会被保留  "
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "source": [
    "pop['California']"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "year\n",
       "2000    33871648\n",
       "2010    37253956\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 25
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "类似的还有局部切片，不过要求MultiIndex是按顺序排列的（就像将在3.6.4节介绍的那样）："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "source": [
    "pop.loc['California':'New York']"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "state       year\n",
       "California  2000    33871648\n",
       "            2010    37253956\n",
       "New York    2000    18976457\n",
       "            2010    19378102\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 26
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "source": [
    "pop[:, 2000]"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "state\n",
       "California    33871648\n",
       "New York      18976457\n",
       "Texas         20851820\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 27
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "source": [
    "pop[pop > 22000000]"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "state       year\n",
       "California  2000    33871648\n",
       "            2010    37253956\n",
       "Texas       2010    25145561\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 28
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "source": [
    "pop[['California', 'Texas']]"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "state       year\n",
       "California  2000    33871648\n",
       "            2010    37253956\n",
       "Texas       2000    20851820\n",
       "            2010    25145561\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "execution_count": 29
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "2. DataFrame多级索引"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "source": [
    "health_data"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>subject</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Bob</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Guido</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Sue</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th>visit</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2013</th>\n",
       "      <th>1</th>\n",
       "      <td>19.0</td>\n",
       "      <td>37.9</td>\n",
       "      <td>49.0</td>\n",
       "      <td>38.6</td>\n",
       "      <td>50.0</td>\n",
       "      <td>36.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>40.0</td>\n",
       "      <td>37.7</td>\n",
       "      <td>47.0</td>\n",
       "      <td>36.5</td>\n",
       "      <td>40.0</td>\n",
       "      <td>36.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2014</th>\n",
       "      <th>1</th>\n",
       "      <td>44.0</td>\n",
       "      <td>37.4</td>\n",
       "      <td>26.0</td>\n",
       "      <td>37.1</td>\n",
       "      <td>48.0</td>\n",
       "      <td>37.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>47.0</td>\n",
       "      <td>37.2</td>\n",
       "      <td>40.0</td>\n",
       "      <td>37.7</td>\n",
       "      <td>41.0</td>\n",
       "      <td>36.8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "subject      Bob       Guido         Sue      \n",
       "type          HR  Temp    HR  Temp    HR  Temp\n",
       "year visit                                    \n",
       "2013 1      19.0  37.9  49.0  38.6  50.0  36.6\n",
       "     2      40.0  37.7  47.0  36.5  40.0  36.2\n",
       "2014 1      44.0  37.4  26.0  37.1  48.0  37.4\n",
       "     2      47.0  37.2  40.0  37.7  41.0  36.8"
      ]
     },
     "metadata": {},
     "execution_count": 30
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "由于DataFrame的基本索引是列索引，因此Series中多级索引的用法到了DataFrame中就应用在列上了。例如，可以通过简单的操作获取Guido的心率数据："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "source": [
    "health_data['Guido', 'HR']"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "year  visit\n",
       "2013  1        49.0\n",
       "      2        47.0\n",
       "2014  1        26.0\n",
       "      2        40.0\n",
       "Name: (Guido, HR), dtype: float64"
      ]
     },
     "metadata": {},
     "execution_count": 31
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "source": [
    "health_data.iloc[:2, :2] "
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>subject</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Bob</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th>visit</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2013</th>\n",
       "      <th>1</th>\n",
       "      <td>19.0</td>\n",
       "      <td>37.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>40.0</td>\n",
       "      <td>37.7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "subject      Bob      \n",
       "type          HR  Temp\n",
       "year visit            \n",
       "2013 1      19.0  37.9\n",
       "     2      40.0  37.7"
      ]
     },
     "metadata": {},
     "execution_count": 32
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "source": [
    "health_data.loc[:, ('Bob', 'HR')]"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "year  visit\n",
       "2013  1        19.0\n",
       "      2        40.0\n",
       "2014  1        44.0\n",
       "      2        47.0\n",
       "Name: (Bob, HR), dtype: float64"
      ]
     },
     "metadata": {},
     "execution_count": 33
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "这种索引元组的用法不是很方便，如果在元组中使用切片还会导致语法错误："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "source": [
    "health_data.loc[(:, 1), (:, 'HR')]"
   ],
   "outputs": [
    {
     "output_type": "error",
     "ename": "SyntaxError",
     "evalue": "invalid syntax (3311942670.py, line 1)",
     "traceback": [
      "\u001b[0;36m  File \u001b[0;32m\"/tmp/ipykernel_6944/3311942670.py\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m    health_data.loc[(:, 1), (:, 'HR')]\u001b[0m\n\u001b[0m                     ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "source": [
    "idx = pd.IndexSlice\n",
    "health_data.loc[idx[:, 1], idx[:, 'HR']] "
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>subject</th>\n",
       "      <th>Bob</th>\n",
       "      <th>Guido</th>\n",
       "      <th>Sue</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>HR</th>\n",
       "      <th>HR</th>\n",
       "      <th>HR</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th>visit</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2013</th>\n",
       "      <th>1</th>\n",
       "      <td>19.0</td>\n",
       "      <td>49.0</td>\n",
       "      <td>50.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2014</th>\n",
       "      <th>1</th>\n",
       "      <td>44.0</td>\n",
       "      <td>26.0</td>\n",
       "      <td>48.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "subject      Bob Guido   Sue\n",
       "type          HR    HR    HR\n",
       "year visit                  \n",
       "2013 1      19.0  49.0  50.0\n",
       "2014 1      44.0  26.0  48.0"
      ]
     },
     "metadata": {},
     "execution_count": 35
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "3.6.4 多级索引行列转换  \n",
    "1. 有序的索引和无序的索引  \n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "source": [
    "index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1, 2]])         \n",
    "data = pd.Series(np.random.rand(6), index=index)         \n",
    "data.index.names = ['char', 'int']\n",
    "data"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "char  int\n",
       "a     1      0.726773\n",
       "      2      0.803897\n",
       "c     1      0.962141\n",
       "      2      0.633662\n",
       "b     1      0.911545\n",
       "      2      0.850617\n",
       "dtype: float64"
      ]
     },
     "metadata": {},
     "execution_count": 36
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "source": [
    "try:             \n",
    "    data['a':'b']         \n",
    "except KeyError as e:             \n",
    "    print(type(e))             \n",
    "    print(e)"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "<class 'pandas.errors.UnsortedIndexError'>\n",
      "'Key length (1) was greater than MultiIndex lexsort depth (0)'\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "尽管从错误信息里面看不出具体的细节，但问题是出在MultiIndex无序排列上。局部切片和许多其他相似的操作都要求MultiIndex的各级索引是有序的（即按照字典顺序由A至Z）。为此，Pandas提供了许多便捷的操作完成排序，如sort_index()和sortlevel()方法。  "
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "source": [
    "data = data.sort_index()\n",
    "data"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "char  int\n",
       "a     1      0.726773\n",
       "      2      0.803897\n",
       "b     1      0.911545\n",
       "      2      0.850617\n",
       "c     1      0.962141\n",
       "      2      0.633662\n",
       "dtype: float64"
      ]
     },
     "metadata": {},
     "execution_count": 38
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "source": [
    "data['a':'b']"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "char  int\n",
       "a     1      0.726773\n",
       "      2      0.803897\n",
       "b     1      0.911545\n",
       "      2      0.850617\n",
       "dtype: float64"
      ]
     },
     "metadata": {},
     "execution_count": 39
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "2. 索引stack与unstack  \n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "source": [
    "pop.unstack(level=0) "
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>state</th>\n",
       "      <th>California</th>\n",
       "      <th>New York</th>\n",
       "      <th>Texas</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2000</th>\n",
       "      <td>33871648</td>\n",
       "      <td>18976457</td>\n",
       "      <td>20851820</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010</th>\n",
       "      <td>37253956</td>\n",
       "      <td>19378102</td>\n",
       "      <td>25145561</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "state  California  New York     Texas\n",
       "year                                 \n",
       "2000     33871648  18976457  20851820\n",
       "2010     37253956  19378102  25145561"
      ]
     },
     "metadata": {},
     "execution_count": 40
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "source": [
    "pop.unstack(level=1)"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>year</th>\n",
       "      <th>2000</th>\n",
       "      <th>2010</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>state</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>33871648</td>\n",
       "      <td>37253956</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New York</th>\n",
       "      <td>18976457</td>\n",
       "      <td>19378102</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>20851820</td>\n",
       "      <td>25145561</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "year            2000      2010\n",
       "state                         \n",
       "California  33871648  37253956\n",
       "New York    18976457  19378102\n",
       "Texas       20851820  25145561"
      ]
     },
     "metadata": {},
     "execution_count": 41
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "3. 索引的设置与重置   \n",
    "层级数据维度转换的另一种方法是行列标签转换，可以通过reset_index方法实现。如果在上面的人口数据Series中使用该方法，则会生成一个列标签中包含之前行索引标签state和year的DataFrame。也可以用数据的name属性为列设置名称：  "
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "source": [
    "pop_flat = pop.reset_index(name='population') \n",
    "pop_flat"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>state</th>\n",
       "      <th>year</th>\n",
       "      <th>population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>California</td>\n",
       "      <td>2000</td>\n",
       "      <td>33871648</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>California</td>\n",
       "      <td>2010</td>\n",
       "      <td>37253956</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>New York</td>\n",
       "      <td>2000</td>\n",
       "      <td>18976457</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>New York</td>\n",
       "      <td>2010</td>\n",
       "      <td>19378102</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Texas</td>\n",
       "      <td>2000</td>\n",
       "      <td>20851820</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Texas</td>\n",
       "      <td>2010</td>\n",
       "      <td>25145561</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        state  year  population\n",
       "0  California  2000    33871648\n",
       "1  California  2010    37253956\n",
       "2    New York  2000    18976457\n",
       "3    New York  2010    19378102\n",
       "4       Texas  2000    20851820\n",
       "5       Texas  2010    25145561"
      ]
     },
     "metadata": {},
     "execution_count": 42
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "在解决实际问题的时候，如果能将类似这样的原始输入数据的列直接转换成MultiIndex，\n",
    "通常将大有裨益。其实可以通过DataFrame的set_index方法实现，返回结果就会是一个带多级索引的DataFrame：  "
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "source": [
    "pop_flat.set_index(['state', 'year'])"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>population</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>state</th>\n",
       "      <th>year</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">California</th>\n",
       "      <th>2000</th>\n",
       "      <td>33871648</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010</th>\n",
       "      <td>37253956</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">New York</th>\n",
       "      <th>2000</th>\n",
       "      <td>18976457</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010</th>\n",
       "      <td>19378102</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Texas</th>\n",
       "      <th>2000</th>\n",
       "      <td>20851820</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010</th>\n",
       "      <td>25145561</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 population\n",
       "state      year            \n",
       "California 2000    33871648\n",
       "           2010    37253956\n",
       "New York   2000    18976457\n",
       "           2010    19378102\n",
       "Texas      2000    20851820\n",
       "           2010    25145561"
      ]
     },
     "metadata": {},
     "execution_count": 43
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "3.6.5 多级索引的数据累计方法"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "source": [
    "health_data"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>subject</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Bob</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Guido</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Sue</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th>visit</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2013</th>\n",
       "      <th>1</th>\n",
       "      <td>19.0</td>\n",
       "      <td>37.9</td>\n",
       "      <td>49.0</td>\n",
       "      <td>38.6</td>\n",
       "      <td>50.0</td>\n",
       "      <td>36.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>40.0</td>\n",
       "      <td>37.7</td>\n",
       "      <td>47.0</td>\n",
       "      <td>36.5</td>\n",
       "      <td>40.0</td>\n",
       "      <td>36.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2014</th>\n",
       "      <th>1</th>\n",
       "      <td>44.0</td>\n",
       "      <td>37.4</td>\n",
       "      <td>26.0</td>\n",
       "      <td>37.1</td>\n",
       "      <td>48.0</td>\n",
       "      <td>37.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>47.0</td>\n",
       "      <td>37.2</td>\n",
       "      <td>40.0</td>\n",
       "      <td>37.7</td>\n",
       "      <td>41.0</td>\n",
       "      <td>36.8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "subject      Bob       Guido         Sue      \n",
       "type          HR  Temp    HR  Temp    HR  Temp\n",
       "year visit                                    \n",
       "2013 1      19.0  37.9  49.0  38.6  50.0  36.6\n",
       "     2      40.0  37.7  47.0  36.5  40.0  36.2\n",
       "2014 1      44.0  37.4  26.0  37.1  48.0  37.4\n",
       "     2      47.0  37.2  40.0  37.7  41.0  36.8"
      ]
     },
     "metadata": {},
     "execution_count": 44
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "source": [
    "data_mean = health_data.mean(level='year') \n",
    "data_mean"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stderr",
     "text": [
      "/tmp/ipykernel_6944/912545124.py:1: FutureWarning: Using the level keyword in DataFrame and Series aggregations is deprecated and will be removed in a future version. Use groupby instead. df.median(level=1) should use df.groupby(level=1).median().\n",
      "  data_mean = health_data.mean(level='year')\n"
     ]
    },
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th>subject</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Bob</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Guido</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Sue</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>type</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2013</th>\n",
       "      <td>29.5</td>\n",
       "      <td>37.8</td>\n",
       "      <td>48.0</td>\n",
       "      <td>37.55</td>\n",
       "      <td>45.0</td>\n",
       "      <td>36.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2014</th>\n",
       "      <td>45.5</td>\n",
       "      <td>37.3</td>\n",
       "      <td>33.0</td>\n",
       "      <td>37.40</td>\n",
       "      <td>44.5</td>\n",
       "      <td>37.1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "subject   Bob       Guido          Sue      \n",
       "type       HR  Temp    HR   Temp    HR  Temp\n",
       "year                                        \n",
       "2013     29.5  37.8  48.0  37.55  45.0  36.4\n",
       "2014     45.5  37.3  33.0  37.40  44.5  37.1"
      ]
     },
     "metadata": {},
     "execution_count": 45
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "如果再设置axis参数，就可以对列索引进行类似的累计操作了："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "source": [
    "data_mean.mean(axis=1, level='type') "
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stderr",
     "text": [
      "/tmp/ipykernel_6944/1720862384.py:1: FutureWarning: Using the level keyword in DataFrame and Series aggregations is deprecated and will be removed in a future version. Use groupby instead. df.median(level=1) should use df.groupby(level=1).median().\n",
      "  data_mean.mean(axis=1, level='type')\n"
     ]
    },
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>type</th>\n",
       "      <th>HR</th>\n",
       "      <th>Temp</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2013</th>\n",
       "      <td>40.833333</td>\n",
       "      <td>37.250000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2014</th>\n",
       "      <td>41.000000</td>\n",
       "      <td>37.266667</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "type         HR       Temp\n",
       "year                      \n",
       "2013  40.833333  37.250000\n",
       "2014  41.000000  37.266667"
      ]
     },
     "metadata": {},
     "execution_count": 46
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "![](2021-11-16-21-48-14.png)"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [],
   "metadata": {}
  }
 ],
 "metadata": {
  "orig_nbformat": 4,
  "language_info": {
   "name": "python",
   "version": "3.8.10",
   "mimetype": "text/x-python",
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "pygments_lexer": "ipython3",
   "nbconvert_exporter": "python",
   "file_extension": ".py"
  },
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3.8.10 64-bit ('d2l-zh': conda)"
  },
  "interpreter": {
   "hash": "0bedf3452ed48e38a2b75ff9f66ce8ae9ae2e92e8665cb618ee0797d2f46b9e5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}